General Query Expansion Techniques for Spoken Document Retrieval
Authors
Abstract
This paper presents some developments in query expansion and document representation of our Spoken Document Retrieval (SDR) system since the 1998 Text REtrieval Conference (TREC-7). We have shown that a modification of the document representation combining several techniques for query expansion can improve Average Precision by relative to a system similar to that which we presented at TREC-7 [1]. These new experiments have also confirmed that the degradation of Average Precision due to a Word Error Rate (WER) of is relatively small (around 2% relative). We hope to repeat these experiments when larger document collections become available to evaluate the scalability of these techniques.
Similar resources
Effects of Query Expansion for Spoken Document Passage Retrieval
One of the major challenges for spoken document retrieval is how to handle speech recognition errors within the target documents. Query expansion is promising for this challenge. In this paper, we apply relevance models, a type of query expansion method, for the spoken document passage retrieval task. We adapted the original relevance model for passage retrieval. We also extended it to benefit ...
Spoken document retrieval method combining query expansion with continuous syllable recognition for NTCIR-SpokenDoc
In this paper, we propose a spoken document retrieval method which combines query expansion with continuous syllable recognition. The proposed method expands a query by using words from the web pages collected by a search engine. It is assumed that relevant document vectors exist on the plane which is constructed from the query vector and the extended vector. The weight parameter between a targ...
Toward improvement of SDR accuracy using LDA and query expansion for SpokenDoc
This paper investigates several techniques for spoken document retrieval, aiming to improve retrieval performance over the conventional method, i.e., TF-IDF. The first approach employs rescaled unigrams of LDA to compute a similarity score. The second technique employs query expansion by web retrieval using the Yahoo! API. And the third technique is Prioritized And-operator Retrieval based on ...
Open-vocabulary spoken-document retrieval based on query expansion using related web documents
This paper proposes a new method for open-vocabulary spoken-document retrieval based on query expansion using related Web documents. A large vocabulary continuous speech recognition (LVCSR) system first transcribes spoken documents into word sequences, which are then segmented into semantically cohesive units (i.e., stories) using a text segmentation technique. Given a text query word, Web docu...
Phonetic query expansion for spoken document retrieval
We are interested in retrieving information from speech data using phonetic search. We show improvement by expanding the query phonetically using a joint maximum entropy N-gram model. The value of this approach is demonstrated on Broadcast News data from NIST 2006 Spoken Term Detection evaluation.